
Add bulk apis for pipeline status #25731

Merged
ulixius9 merged 18 commits into main from batch_pipeline_status on Feb 10, 2026
Conversation

@harshach (Collaborator) commented Feb 6, 2026

Add bulk apis for pipeline status

Describe your changes:

Fixes

I worked on ... because ...


Summary by Gitar

  • New bulk API endpoint:
    • PUT /{fqn}/status/bulk accepts a list of PipelineStatus objects, reducing API calls from N to 1 per pipeline
  • Backend optimization:
    • JDBI batch operations with MySQL/PostgreSQL upsert support in PipelineRepository.bulkUpsertPipelineStatuses
    • Bulk Elasticsearch indexing via SearchRepository.bulkIndexPipelineExecutions
  • Databricks connector enhancement:
    • yield_pipeline_status now aggregates runs within configurable numberOfStatus window (default: 1 day)
  • Python SDK additions:
    • OMetaBulkPipelineStatus model and add_bulk_pipeline_status method in OMetaPipelineMixin (see the usage sketch after this list)
  • Integration tests:
    • put_bulkPipelineStatus_200_OK validates 5 status submissions with overlap handling
    • put_bulkPipelineStatus_invalidTask_4xx verifies task validation
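
A minimal Python sketch of how the new SDK method might be used, based only on the names above; the exact signature of add_bulk_pipeline_status and the role of OMetaBulkPipelineStatus are not shown on this page, so treat the parameters as assumptions:

from metadata.generated.schema.entity.data.pipeline import PipelineStatus, StatusType
from metadata.ingestion.ometa.ometa_api import OpenMetadata


def publish_bulk_status(metadata: OpenMetadata, pipeline_fqn: str, run_timestamps: list[int]) -> None:
    """Send one bulk request (PUT /{fqn}/status/bulk) instead of one call per run."""
    statuses = [
        # Epoch-millisecond timestamps; a real ingestion would set executionStatus per run.
        PipelineStatus(executionStatus=StatusType.Successful, timestamp=ts)
        for ts in run_timestamps
    ]
    # Assumed signature: the PR adds add_bulk_pipeline_status to OMetaPipelineMixin,
    # but its exact parameters are not documented in this summary.
    metadata.add_bulk_pipeline_status(fqn=pipeline_fqn, statuses=statuses)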

This will update automatically on new commits.


Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

github-actions bot commented Feb 6, 2026

TypeScript types have been updated based on the JSON schema changes in the PR

@github-actions github-actions bot requested a review from a team as a code owner February 6, 2026 07:54
Comment on lines +268 to +269
if run.start_time and run.start_time < cutoff_ts:
    break

⚠️ Edge Case: Databricks connector break assumes descending run order

The yield_pipeline_status method uses break when it encounters a run with start_time < cutoff_ts. This assumes that self.client.get_job_runs() returns runs in strictly descending chronological order (newest first).

If the Databricks API returns runs in a different order (e.g., ascending, or unordered), this break will cause the method to skip recent runs that appear after an older one in the list. This would result in incomplete pipeline status data being ingested.

Suggestion: Either:

  1. Verify and document that get_job_runs() guarantees descending order (Databricks Jobs API does return runs in descending order by default when using list_runs)
  2. Use continue instead of break to safely handle all runs regardless of ordering, though this loses the early-termination optimization
  3. Add a sort before iterating: sorted(runs, key=lambda r: r.get('start_time', 0), reverse=True) (a fuller sketch follows below)
if run.start_time and run.start_time < cutoff_ts:
    continue  # safer: skip old runs without assuming order
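
For completeness, a hedged sketch of option 3 that keeps the early-exit optimization; it assumes each run object exposes start_time in epoch milliseconds (possibly None), mirroring the attribute access in the snippet above rather than the exact connector code:

def iter_runs_within_window(runs, cutoff_ts):
    """Yield only runs newer than cutoff_ts, without trusting the API's ordering."""
    # Sort defensively (newest first) so the break below is always safe.
    for run in sorted(runs, key=lambda r: r.start_time or 0, reverse=True):
        if run.start_time and run.start_time < cutoff_ts:
            break  # everything after this point is older than the lookback window
        yield run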


github-actions bot commented Feb 6, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion:trivy (debian 12.12)

Vulnerabilities (4)

Package Vulnerability ID Severity Installed Version Fixed Version
libpam-modules CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-modules-bin CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam-runtime CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2
libpam0g CVE-2025-6020 🚨 HIGH 1.5.2-6+deb12u1 1.5.2-6+deb12u2

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (18)

Package Vulnerability ID Severity Installed Version Fixed Version
Werkzeug CVE-2024-34069 🚨 HIGH 2.2.3 3.0.3
aiohttp CVE-2025-69223 🚨 HIGH 3.12.12 3.13.3
aiohttp CVE-2025-69223 🚨 HIGH 3.13.2 3.13.3
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
azure-core CVE-2026-21226 🚨 HIGH 1.37.0 1.38.0
jaraco.context CVE-2026-23949 🚨 HIGH 5.3.0 6.1.0
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
protobuf CVE-2026-0994 🚨 HIGH 4.25.8 6.33.5, 5.29.6
pyasn1 CVE-2026-23490 🚨 HIGH 0.6.1 0.6.2
python-multipart CVE-2026-24486 🚨 HIGH 0.0.20 0.0.22
ray CVE-2025-62593 🔥 CRITICAL 2.47.1 2.52.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /home/airflow/openmetadata-airflow-apis/openmetadata_managed_apis.egg-info/PKG-INFO

No Vulnerabilities Found

github-actions bot commented Feb 6, 2026

🛡️ TRIVY SCAN RESULT 🛡️

Target: openmetadata-ingestion-base-slim:trivy (debian 12.13)

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Java

Vulnerabilities (33)

Package Vulnerability ID Severity Installed Version Fixed Version
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.12.7 2.15.0
com.fasterxml.jackson.core:jackson-core CVE-2025-52999 🚨 HIGH 2.13.4 2.15.0
com.fasterxml.jackson.core:jackson-databind CVE-2022-42003 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4.2
com.fasterxml.jackson.core:jackson-databind CVE-2022-42004 🚨 HIGH 2.12.7 2.12.7.1, 2.13.4
com.google.code.gson:gson CVE-2022-25647 🚨 HIGH 2.2.4 2.8.9
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.3.0 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.3.0 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.3.0 3.25.5, 4.27.5, 4.28.2
com.google.protobuf:protobuf-java CVE-2021-22569 🚨 HIGH 3.7.1 3.16.1, 3.18.2, 3.19.2
com.google.protobuf:protobuf-java CVE-2022-3509 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2022-3510 🚨 HIGH 3.7.1 3.16.3, 3.19.6, 3.20.3, 3.21.7
com.google.protobuf:protobuf-java CVE-2024-7254 🚨 HIGH 3.7.1 3.25.5, 4.27.5, 4.28.2
com.nimbusds:nimbus-jose-jwt CVE-2023-52428 🚨 HIGH 9.8.1 9.37.2
com.squareup.okhttp3:okhttp CVE-2021-0341 🚨 HIGH 3.12.12 4.9.2
commons-beanutils:commons-beanutils CVE-2025-48734 🚨 HIGH 1.9.4 1.11.0
commons-io:commons-io CVE-2024-47554 🚨 HIGH 2.8.0 2.14.0
dnsjava:dnsjava CVE-2024-25638 🚨 HIGH 2.1.7 3.6.0
io.netty:netty-codec-http2 CVE-2025-55163 🚨 HIGH 4.1.96.Final 4.2.4.Final, 4.1.124.Final
io.netty:netty-codec-http2 GHSA-xpw8-rcwv-8f8p 🚨 HIGH 4.1.96.Final 4.1.100.Final
io.netty:netty-handler CVE-2025-24970 🚨 HIGH 4.1.96.Final 4.1.118.Final
net.minidev:json-smart CVE-2021-31684 🚨 HIGH 1.3.2 1.3.3, 2.4.4
net.minidev:json-smart CVE-2023-1370 🚨 HIGH 1.3.2 2.4.9
org.apache.avro:avro CVE-2024-47561 🔥 CRITICAL 1.7.7 1.11.4
org.apache.avro:avro CVE-2023-39410 🚨 HIGH 1.7.7 1.11.3
org.apache.derby:derby CVE-2022-46337 🔥 CRITICAL 10.14.2.0 10.14.3, 10.15.2.1, 10.16.1.2, 10.17.1.0
org.apache.ivy:ivy CVE-2022-46751 🚨 HIGH 2.5.1 2.5.2
org.apache.mesos:mesos CVE-2018-1330 🚨 HIGH 1.4.3 1.6.0
org.apache.thrift:libthrift CVE-2019-0205 🚨 HIGH 0.12.0 0.13.0
org.apache.thrift:libthrift CVE-2020-13949 🚨 HIGH 0.12.0 0.14.0
org.apache.zookeeper:zookeeper CVE-2023-44981 🔥 CRITICAL 3.6.3 3.7.2, 3.8.3, 3.9.1
org.eclipse.jetty:jetty-server CVE-2024-13009 🚨 HIGH 9.4.56.v20240826 9.4.57.v20241219
org.lz4:lz4-java CVE-2025-12183 🚨 HIGH 1.8.0 1.8.1

🛡️ TRIVY SCAN RESULT 🛡️

Target: Node.js

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: Python

Vulnerabilities (8)

Package Vulnerability ID Severity Installed Version Fixed Version
apache-airflow CVE-2025-68438 🚨 HIGH 3.1.5 3.1.6
apache-airflow CVE-2025-68675 🚨 HIGH 3.1.5 3.1.6
jaraco.context CVE-2026-23949 🚨 HIGH 6.0.1 6.1.0
starlette CVE-2025-62727 🚨 HIGH 0.48.0 0.49.1
urllib3 CVE-2025-66418 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2025-66471 🚨 HIGH 1.26.20 2.6.0
urllib3 CVE-2026-21441 🚨 HIGH 1.26.20 2.6.3
wheel CVE-2026-24049 🚨 HIGH 0.45.1 0.46.2

🛡️ TRIVY SCAN RESULT 🛡️

Target: /etc/ssl/private/ssl-cert-snakeoil.key

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/extended_sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/lineage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_data_aut.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.json

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage.yaml

No Vulnerabilities Found

🛡️ TRIVY SCAN RESULT 🛡️

Target: /ingestion/pipelines/sample_usage_aut.yaml

No Vulnerabilities Found

ulixius9 previously approved these changes Feb 6, 2026
gitar-bot bot commented Feb 9, 2026

🔍 CI failure analysis for 4a073fa: Maven SonarCloud CI job failed with 5 Data Product test failures unrelated to this PR's pipeline status API changes.

Issue

Maven SonarCloud CI: 5 Data Product test failures + 1 test error

Root Cause

Unrelated to PR changes - Data Product resource tests are failing with assertion errors. This PR modifies pipeline status APIs, not Data Product functionality.

Details

Failed tests: All in DataProductResourceTest

  • testDataProductBulkOutputPorts - expected: success but was: failure
  • testGetPortsByNameEndpoints - expected: 1 but was: 0
  • testGetPortsViewEndpoint - expected: 1 but was: 0
  • testGetOutputPortsReturnsFullEntities - expected: 1 but was: 0
  • testDataProductDomainMigrationWithInputOutputPorts - Output port should be in target domain after migration (domain UUID mismatch)
  • testDataProductBulkPortsViaApi - test error

Test results:

  • Job 62999935598 (maven-sonarcloud-ci)
  • 7919 tests run, 5 failures, 1 error, 701 skipped
  • Duration: ~3.5 hours

Why unrelated to PR:

  1. This PR changes pipeline status ingestion APIs (Java backend + Python connectors)
  2. Modified files: PipelineRepository.java, PipelineResource.java, databrickspipeline/metadata.py, pipeline_status.py, SQL migrations
  3. No changes to Data Product functionality - These are separate domains
  4. Test failures are in Data Product bulk operations, port endpoints, and domain migration
  5. The PR doesn't touch Data Product repository, resource classes, or domain migration logic
  6. All failures show business logic assertion mismatches (status codes, counts, domain IDs), not compilation or integration issues

Code Review: ⚠️ Changes requested (7 resolved / 13 findings)

Bulk pipeline status API is well-structured with good test coverage. All 6 previous findings remain unresolved — most notably the missing reindexAcrossIndices call in the bulk path and the MySQL migration data loss risk for rows with missing $.timestamp.

⚠️ Edge Case: Databricks connector break assumes descending run order

📄 ingestion/src/metadata/ingestion/source/pipeline/databrickspipeline/metadata.py:268-269

The yield_pipeline_status method uses break when it encounters a run with start_time < cutoff_ts. This assumes that self.client.get_job_runs() returns runs in strictly descending chronological order (newest first).

If the Databricks API returns runs in a different order (e.g., ascending, or unordered), this break will cause the method to skip recent runs that appear after an older one in the list. This would result in incomplete pipeline status data being ingested.

Suggestion: Either:

  1. Verify and document that get_job_runs() guarantees descending order (Databricks Jobs API does return runs in descending order by default when using list_runs)
  2. Use continue instead of break to safely handle all runs regardless of ordering, though this loses the early-termination optimization
  3. Add a sort before iterating: sorted(runs, key=lambda r: r.get('start_time', 0), reverse=True)
if run.start_time and run.start_time < cutoff_ts:
    continue  # safer: skip old runs without assuming order
⚠️ Bug: MySQL migration: data loss risk during column drop/re-add

📄 bootstrap/sql/migrations/native/1.11.8/mysql/schemaChanges.sql:32-36

The MySQL migration drops the timestamp column and re-adds it in a single ALTER TABLE statement. While MySQL can handle this atomically within one DDL statement, dropping a VIRTUAL generated column and re-adding it as STORED forces a full table rebuild on entity_extension_time_series. On large deployments, this table can contain millions of rows (pipeline statuses, test results, profiler data, etc.), which means:

  1. Table lock duration: The ALTER TABLE will hold a metadata lock for the entire rebuild, blocking all reads/writes to this table.
  2. Downtime risk: Any concurrent pipeline status writes or reads during migration will be blocked.

Consider using pt-online-schema-change or gh-ost for large deployments, or at minimum, document the expected downtime/lock duration in the migration comments so operators can plan accordingly. Also consider adding a note about the migration's impact in the PR description since this is a high-traffic table.

⚠️ Edge Case: Bulk DB insert not atomic with validation/ES indexing

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PipelineRepository.java:419

In addBulkPipelineStatus, the flow is:

  1. Validate tasks (lines 389-396)
  2. Bulk upsert to DB (line 405)
  3. Bulk index to ES (line 407)
  4. Store RDF (line 409)
  5. Update pipeline entity index (line 419)

If step 3, 4, or 5 throws an exception, the DB records from step 2 have already been committed (JDBI useHandle auto-commits). Unlike the single-status path which wraps ES indexing in a try-catch (lines 360-365), the bulk path's bulkIndexPipelineExecutions already has its own try-catch internally, which is good. However, the searchRepository.updateEntityIndex(pipeline) call at line 419 is NOT wrapped in a try-catch. If it throws, the response will be an error even though the DB upsert succeeded, leaving the client uncertain about the operation's status.

Suggested fix: Wrap the entity index update in a try-catch to match the pattern used for ES execution indexing:

try {
    searchRepository.updateEntityIndex(pipeline);
} catch (Exception e) {
    LOG.error("Failed to update pipeline entity index in Elasticsearch", e);
}
⚠️ Bug: Bulk status path missing reindexAcrossIndices call

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PipelineRepository.java:419-423 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PipelineRepository.java:419

The single-status addPipelineStatus method (line 371) calls reindexAcrossIndices("upstreamLineage.pipeline.fullyQualifiedName", ...) to update downstream lineage entities in the search index when a pipeline's status changes. The bulk path addBulkPipelineStatus (lines 419-423) only calls searchRepository.updateEntityIndex(pipeline) but omits the reindexAcrossIndices call entirely.

This means entities downstream of the pipeline (e.g., tables with lineage from this pipeline) won't have their search index updated when bulk statuses are ingested, causing stale upstreamLineage data in search results.

Suggested fix: Add the same reindexAcrossIndices call after searchRepository.updateEntityIndex(pipeline):

try {
    searchRepository.updateEntityIndex(pipeline);
    searchRepository.getSearchClient()
        .reindexAcrossIndices(
            "upstreamLineage.pipeline.fullyQualifiedName", pipeline.getEntityReference());
} catch (Exception e) {
    LOG.error("Failed to update pipeline entity index in Elasticsearch", e);
}
💡 Quality: Redundant pipeline lookup when statuses list is empty

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PipelineRepository.java:377-384

In addBulkPipelineStatus, when the list is null/empty, the method fetches the pipeline and returns it (lines 377-380). Then if the list is non-empty, it fetches the pipeline again (lines 383-384). This means in the normal (non-empty) path, the pipeline is fetched once, but the empty-list guard creates an unnecessary code path that returns ENTITY_UPDATED even though nothing was updated.

Consider either:

  1. Returning Response.Status.NO_CONTENT for empty lists, or
  2. Moving the pipeline fetch before the check and reusing it:
Pipeline pipeline = daoCollection.pipelineDAO().findEntityByName(fqn);
pipeline.setService(getContainer(pipeline.getId()));
if (pipelineStatuses == null || pipelineStatuses.isEmpty()) {
    return new RestUtil.PutResponse<>(Response.Status.OK, pipeline, ENTITY_NO_CHANGE);
}
💡 Edge Case: Bulk status doesn't check if latest exceeds existing status

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PipelineRepository.java:395-400

The addBulkPipelineStatus method finds the latest status among the submitted batch and sets it as the pipeline's pipelineStatus. However, it doesn't check whether the pipeline already has a more recent status from a previous call. If someone submits a batch of older statuses after a newer one was already set, the pipeline's pipelineStatus would regress to an older timestamp.

For comparison, the existing single-status addPipelineStatus also doesn't guard against this, so this may be by-design. But with bulk operations, the risk is higher since a batch of historical data is a common use case.

Suggestion: Consider comparing the batch's latest timestamp against the pipeline's current pipelineStatus timestamp before overwriting.

✅ 7 resolved
Edge Case: Bulk task validation NPE when taskStatus is null

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/PipelineRepository.java:390-395
In addBulkPipelineStatus, the inner loop iterates over pipelineStatus.getTaskStatus() without null checking. If a PipelineStatus is submitted with no taskStatus field (which is valid per the schema — the field is optional), getTaskStatus() may return null, causing a NullPointerException.

The single-status path at line 326 has the same pattern, so this is a pre-existing pattern, but introducing it in a bulk API amplifies the impact — a single null taskStatus in a batch of 1000 would fail the entire batch after potentially partial DB writes.

Suggested fix:

for (PipelineStatus pipelineStatus : pipelineStatuses) {
    for (Status taskStatus : listOrEmpty(pipelineStatus.getTaskStatus())) {
        if (validatedTasks.add(taskStatus.getName())) {
            validateTask(pipeline, taskStatus.getName());
        }
    }
}

The listOrEmpty utility is already imported in this file.

Bug: Generated TS uses numberOfStatus but schema defines statusLookbackDays

📄 openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json:85-90 📄 openmetadata-ui/src/main/resources/ui/src/generated/metadataIngestion/pipelineServiceMetadataPipeline.ts:47-51
The JSON schema at pipelineServiceMetadataPipeline.json defines the field as statusLookbackDays, and the Python Databricks connector correctly references self.source_config.statusLookbackDays. However, all four generated TypeScript files define the property as numberOfStatus instead:

  • createIngestionPipeline.ts line 634
  • ingestionPipeline.ts line 1263
  • pipelineServiceMetadataPipeline.ts line 51
  • workflow.ts line 5115

This means the frontend TypeScript code would use numberOfStatus while the backend Python code uses statusLookbackDays, creating a naming mismatch. The TS codegen appears to have been run against a different version of the schema (possibly before the rename from numberOfStatus to statusLookbackDays).

Fix: Regenerate the TypeScript types from the current JSON schema so the field is named statusLookbackDays consistently across all layers.

Quality: Dead code: EntityTimeSeriesDAO upsert/upsertBatch added but unused

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesDAO.java:92-97
The EntityTimeSeriesDAO interface gains three new methods: upsert(table, ...), upsert(entityFQN, ...), and upsertBatch(...). However, none of these are used anywhere in the codebase. The PipelineRepository.bulkUpsertPipelineStatuses method uses raw JDBI batch operations with Entity.getJdbi().useHandle(...) instead of going through the DAO.

This creates two issues:

  1. Dead code that adds maintenance burden
  2. Two competing approaches to the same operation (DAO method vs. raw JDBI)

Suggestion: Either remove the unused DAO methods, or refactor PipelineRepository to use them. Note that upsertBatch currently iterates one-by-one (defeating the batch purpose), so if you keep it, consider making it truly batch.

Edge Case: Bulk API endpoint has no payload size limit

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/pipelines/PipelineResource.java:523
The addBulkPipelineStatus REST endpoint at /{fqn}/status/bulk accepts a List<PipelineStatus> with no upper bound on the number of items. A caller could submit thousands or millions of status records in a single request, which would:

  1. Cause an oversized JDBI batch operation that may exhaust memory or hit DB timeout limits
  2. Create a proportionally large Elasticsearch bulk index request
  3. Block the request thread for an extended period

Consider adding a validation check at the API layer to cap the list size (e.g., 1000 items), or at minimum add a @Size(max=N) annotation on the parameter:

@Valid @Size(max = 1000) List<PipelineStatus> pipelineStatuses

Alternatively, enforce it in PipelineRepository.addBulkPipelineStatus with an early validation check.

Quality: Misleading config name: numberOfStatus actually means days

📄 openmetadata-spec/src/main/resources/json/schema/metadataIngestion/pipelineServiceMetadataPipeline.json:85-90
The JSON schema field numberOfStatus has the description "Number of days of pipeline run status history to ingest" — it represents a number of days, not a number of statuses. The name is misleading and may confuse users into thinking it controls the count of status records to ingest.

In the Databricks connector, it's used as:

lookback_days = self.source_config.numberOfStatus or 1
cutoff_ts = int((datetime.now(timezone.utc) - timedelta(days=lookback_days)).timestamp() * 1000)

Suggestion: Rename to numberOfDays or statusLookbackDays to accurately reflect the semantics. If numberOfStatus is an existing field being repurposed, document the semantic change.

...and 2 more resolved from earlier reviews


sonarqubecloud bot commented Feb 9, 2026

@ulixius9 ulixius9 merged commit b244798 into main Feb 10, 2026
33 of 37 checks passed
@ulixius9 ulixius9 deleted the batch_pipeline_status branch February 10, 2026 12:44
@github-actions

Failed to cherry-pick changes to the 1.11.9 branch.
Please cherry-pick the changes manually.
You can find more details here.

ulixius9 pushed a commit that referenced this pull request Feb 10, 2026
* Add bulk apis for pipeline status

* Update generated TypeScript types

* Fix gitar comments

* Update generated TypeScript types

* Fix pycheck

* Address comments

* Fix databricks test

* Move schema changes to 1.11.9

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: harshsoni2024 <harshsoni2024@gmail.com>
manerow added a commit that referenced this pull request Feb 10, 2026

Labels

  • backend
  • safe to test: Add this label to run secure Github workflows on PRs
  • To release: Will cherry-pick this PR into the release branch
